Computing has undergone a fundamental shift from latency-optimized CPU design to throughput-oriented GPU architectures. While a CPU is like a high-speed delivery motorcycle (fast for one package), a GPU is a massive cargo ship: it moves slower per item but carries 50,000 containers at once.
1. Latency vs. Throughput
CPUs are engineered to minimize the time-to-completion (latency) of a single instruction stream, using large caches, sophisticated branch prediction, and speculative out-of-order execution. Graphics Processing Units (GPUs), by contrast, are designed to maximize aggregate work-per-second (throughput): they execute thousands of threads in parallel, trading single-thread speed for massive parallel capacity.
2. Transistor Allocation
A GPU provides much higher instruction throughput and memory bandwidth than a CPU within a similar price and power envelope. GPUs are specialized for highly parallel computations and devote more transistors to data processing units (ALUs), while CPUs dedicate more transistors to data caching and flow control.
3. The Evolution of CUDA
Compute Unified Device Architecture (CUDA), introduced by NVIDIA in 2006, is a parallel computing platform and programming model that lets developers harness the GPU for general-purpose computation through small extensions to C/C++, independent of graphics APIs such as OpenGL or Direct3D.
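To make the programming model concrete, here is a minimal CUDA sketch of element-wise vector addition: the serial loop a CPU would run becomes a grid of thousands of lightweight threads, one per element. The kernel and buffer names (`vecAdd`, `d_a`, and so on) are illustrative choices, not taken from the text above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one element: the loop index of the serial
// version becomes the thread's global index within the launch grid.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the grid may overshoot n
}

int main() {
    const int n = 1 << 20;  // one million elements
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device (GPU) buffers, plus host-to-device copies.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compiled with nvcc and run on a CUDA-capable GPU, each element is processed by its own thread; the throughput described above comes from the hardware scheduling thousands of these threads concurrently rather than from any one thread running fast.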